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ABSTRACT 

A sample of eight quasars observed at high resolution and signal-to-noise is used to 
determine the probability distribution function (PDF), the power spectrum, and the 
correlation function of the transmitted flux in the Lya forest, in three redshift bins 
centered at z = 2.41, 3.00, and 3.89. All the results are presented in tabular form, with 
full error covariance matrices to allow for comparisons with any numerical simulations 
and with other data sets. The observations are compared with a numerical simulation 
of the Lya forest of a ACDM model with = 0.4, known to agree with other large-scale 
structure observational constraints. There is excellent agreement for the PDF, if the 
mean transmitted flux is adjusted to match the observations. A small difference between 
the observed and predicted PDF is found at high fluxes and low redshift, which may 
be due to the uncertain effects of fitting the spectral continuum. Using the numerical 
simulation, we show how the flux power spectrum can be used to recover the initial 
power spectrum of density fluctuations. From our sample of eight quasars, we measure 
the amplitude of the mass power spectrum to correspond to a linear variance per unit 
In A: of A2(A;) = 0.72 ±0.09 at k = 0.04( kms"^)"^ and z = 3, and the slope of the power 
spectrum near the same k to be Up = —2.55 it 0.10 (statistical error bars). The results 
are statistically consistent with Croft et al. , although our value for the rms fluctuation 
is lower by a factor 0.75. For the ACDM model we use, the implied primordial slope is 
n = 0.93 ± 0.10, and the normalization is erg = 0.68 + 1.16(0.95 - n) ± 0.04. 
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Subject headings: cosmology: observations — intergalactic medium — large-scale struc- 
ture of universe — quasars: absorption lines 

1. INTRODUCTION 

The study of the Lya forest has been making a transition toward unification with other methods 
of investigating the large-scale structure of the universe. Several analytical models had proposed 
that the structures formed at high redshift through gravitational collapse on a range of scales 
and gas densities could produce the Lya forest absorption lines (Bahcall &; Salpeter 1965; Arons 
1972; Rees 1986; Bond, Szalay, k Silk 1988; McGill 1990; Bi 1993; Miralda-Escude & Rees 1993). 
Detailed hydrodynamical simulations of the evolution of structure in a photoionized intergalactic 
medium (hereafter, IGM) in cold dark matter models have shown that the basic properties of the 
Lya absorption spectra (see Ranch 1998for a review) can indeed be understood by the evolving 
network of sheets, filaments, and halos characteristic of gravitational dynamics in cosmology (e.g., 
Cen et al. 1994; Zhang, Anninos, &; Norman 1995; Hernquist et al. 1996; Miralda-Escude et al. 
1996; Wadsley & Bond 1996; Zhang et al. 1997; Theuns et al. 1998). Progress with theoretical 
modeling of the intergalactic medium occurred at the same time as the first high resolution and 
signal-to-noise Lya forest spectra from the Keck telescope's HIRES instrument (Vogt et al. 1994) 
became available, and as the large transverse sizes (Bechtold et al. 1994; Dinshaw et al. 1994) 
and the small-scale smoothness (Smette et al. 1992, 1995) of the absorbers were discovered, which 
eliminated many of the alternative models. 

As the results of these simulations and the effects of numerical resolution, box size and ther- 
mal evolution of the IGM are better understood, one can start measuring parameters of the theory 

of large-scale structure from the observations of the Lya forest. Thus, the distribution and the 
mean of the transmitted flux constrain the density-temperature distribution of the ionized gas, and 
determine a parameter depending mainly on the baryon density and the intensity of the ionizing 
background (Ranch et al. 1997; Weinberg et al. 1997). The power spectrum of the Lya forest 
is closely related to the power spectrum of the mass fluctuations, allowing a measurement of the 
amplitude of these fluctuations at high redshift which gives important constraints for large-scale 
structure models (Croft et al. 1998, 1999; Weinberg et al. 1999; Croft, Hu, & Dave 1999). The 
Doppler parameter distribution of the absorption lines can be used to measure the mean tempera- 
ture at different densities (Schaye et al. 1999; Ricotti, Gnedin, & Shull 1999; Bryan & Machacek 
1999), which depends on the ionization history of the IGM (Hui Sz Gnedin 1997). 

A lot of these observational results are based on a direct measurement of the statistical prop- 
erties of the transmitted flux in the Lya forest, which is a one-dimensional random field depending 
on the density, temperature and velocity of the gas at every point along the line of sight. While 
a large body of observational data has been published following the more traditional method of 
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fitting the absorption as being due to discrete absorbers with Voigt profiles, the work using flux 
statistics was generally aimed at determining specific cosmological information and less focused 
on a general presentation of the observational results. In this paper, wc present the first detailed 
tabulation of the probability distribution function, power spectrum, and correlation function of the 
transmitted flux using the highest quality spectra currently available, and paying special attention 
to the calculation of error bars. The primary focus of this paper is on a clear presentation of the 
observational results, in a form that is useful for comparisons with future cosmological simulations 
and models, as well as other observations. With this aim, the statistics that we use are intended 
to contain the minimum amount of complexity and theoretical prejudice. We will also compare 
our observational results to the numerical simulation "LIO" in Miralda-Escude et al. (1996), and 
discuss the cosmological implications. We demonstrate how our results can be used to measure the 
primordial power spectrum by a method similar to Croft et al. (1999). Our data has much better 
resolution, ~ 0.1 A, so we can push the measurement of P{k) down to wavelengths A ~ lh~^ Mpc. 

In §2 we describe the data and the simulation that we use. In §3 we present the results 
for the mean and variance of the transmitted flux and discuss their implications for cosmological 
parameters. In §4 we present the probability distribution function of the transmitted flux and 
compare it to that of the numerical simulation. In §5 we present the results for the power spectrum 
of the transmitted flux, and compare them to the simulation. We then discuss the implications for 
the primordial power spectrum of the universe, from the results on scales that are large enough to 
make the Lya forest fluctuations be related to quasi-linear density fluctuations. In §6 we present 
the correlation function of the transmitted flux. Finally, the discussion and conclusions are given 
in §7. Appendix B describes the computation of the error bars. All the results in this paper are 
available in the website http://www.physics.upenn.edu/~jordi/lya. In addition to all the Tables 
included here, the website contains also the full error covariance matrices and other details that 
are too extensive to be published here. 

2. THE OBSERVATIONAL DATA AND THE SIMULATION 

In this section we describe the observational data set and the Lya forest simulation that we 

use. 

2.1. Description of the Observations 

We use a set of eight quasars with spectra that are fully resolved and have high signal-to- 
noise ratio. Seven of our quasars (Q2343-hl23, Q1442+293, Q1107+485, Q1425+604, Q1422+230, 
QOOOO-262, and Q2237-061) are the same as in Ranch et al. (1997), but we add KP 77: 1623+2653, 
one of the triplet of quasars described in Crotts &: Fang (1998). The pixel noise is typically less 
than 5% of the continuum flux level, and frequently as low as 1%. The velocity resolution is 6.6 
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kms ^ (FWHM) and the spectra are binned in 0.04 A pixels. 

Continua were fitted to the HIRES spectra using the IRAF Continuum task. SphneS or 
chebycheff polynomials were used. The order of the polynomials and the number of fitting regions 
strongly depended on the signal-to-noise in the spectra and the redshift. As a rule of thumb, the 
higher the S/N ratio and the weaker the absorption, the more fitting regions were used and the 
higher the order of a polynomial was taken (amounting to several tens of degrees of freedom for a 
typical case of S/N~ 30, z = 2.5 Lya forest spectrum). The regions between Lya and Ly/3 were 
cut into 2 to 4 pieces for fitting, which were re-joined after fitting to form a long spectrum. 

The continua were fitted to regions of the Lyman a forest deemed 'free of absorption lines' 
(as judged by eye). At best this is a problematic definition, as the perception of a stretch of 
spectrum as being free of absorption depends on the signal-to-noise ratio of the data, the average 
line density (and thus the redshift), the mean background absorption (i.e., the possibility of a 
constant absorption trough on top of individual absorption lines), and the spectral resolution. The 
last point does not pose a problem for the current data as absorption features with widths down 
to 6.6 kms~^ are resolved, and typical Lyman a line widths are much larger than that (e.g.. Ranch 
et al. 1992). The possibility of unrecognized large-scale absorption features in the absorption is a 
more serious concern. Determining the position and extent of line-free spectral regions by eye tends 
to mistake large shallow flux depressions as parts of the unabsorbed continuum, so the continuum 
would be systematically placed low, leading to an underestimate of the absorbed optical depth. 
Similar problems arise for the high redshift (z> 4) Lya forest where more than half of the flux 
is absorbed and the spectrum 'between the absorption lines' rarely recovers enough to reach the 
probable continuum level of the QSO. In such cases only a stiff polynomial (typically with 3 to 5 
degrees of freedom for the region from Lya to Ly/3) can be used. This lack of information caused by 
the sparsely sampled continuum can lead to local uncertainties in the continuum exceeding 5% , as 
can be shown by comparison with more easily flux-normalizeable, low resolution single order spectra 
of the same QSO. The S/N ratio, finally, may also lead to biases, in that, at higher redshifts, the 
apparently unabsorbed continuum portions between the absorption lines shrink in size and noise. 
As the final selection of fitting regions is done by eye, the shortness of the continuum portions 
could easily eliminate true contimium regions or introduce spurious ones in the sample of supposed 
continuum data points. This last effect does not produce global errors in the continuum level, but 
it can introduce local fluctuations. 

These uncertainties in the continuum fitting can affect the results reported in this paper on 
the flux distribution function, the power spectrum and the correlation function, and should be 
considered as a source of possible systematic errors in addition to quoted statistical errors of the 
results. This underscores the need to obtain spectra in the future with a good flux calibration, 
so that the continuum can be fitted with many fewer free parameters under the assumption of a 
smooth underlying continuum spectrum of the observed quasar Press, Rybicki, &; Schneider (1993). 

Figure 1 shows the redshift range covered by the useful part of each spectrum. The wave- 



-5- 




4000 



5000 



6000 



Fig. 1. — The useful redshift range covered by each quasar. The vertical axis is meaningless. The 
vertical dotted lines at z = 2.67 and z = 3.39 separate the data into the three redshift bins that we 
will use. 
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length regions of the spectra of these eight quasars that were included in the present analysis were 
selected as follows: first, only the regions between the Lya and Ly/3 emission wavelengths were 
included, excluding also an interval within 5 h^"^ Mpc of the quasar to avoid the proximity effect. 
Approximately half of the KP 77 spectrum has signal-to-noise ratio lower than any of the Ranch 
et al. (1997) data so this region was eliminated. Finally, damped Lya absorption lines and narrow 
lines known or suspected to be metal lines were removed, by eliminating small intervals of the 
spectra containing these absorption lines from the spectra. The detailed list of wavelength intervals 
eliminated in each quasar is available on the website mentioned above. The data preparation is 
described in more detail in Rauch et al. (1997) and Barlow &: Sargent (1997). 

2.2. Calculation of Errors 

We present in this paper the results on the flux distribution function (in greater detail than 
in Rauch et al. 1997), as well as the power spectrum and correlation function of the transmitted 
flux. All the observational results are given with error bars obtained using the bootstrap method 
(Press et al. 1992), including the full covariance matrices for all the results except the power 
spectrum. The bootstrap procedure consists of dividing the data into N segments and generating 
modified realizations of a statistic by randomly selecting N of the segments (with replacement). 
The dispersion in the bootstrap realizations approximates the error on the observed statistic. The 
method for computing the error bars is described in detail in Appendix B. We present only the 
diagonal elements of the error matrices in the tables here. The full covariance matrices can be 
obtained from the website mentioned above. Our intention is to allow detailed comparisons of 
these results with cosmological simulations and with other observational results. 

2.3. Description of the Simulation 

We compare the observations to the output of the Eulerian hydro dynamical simulation de- 
scribed in Miralda-Escude et al. (1996) (referred to as LIO in that paper). The cosmological model 
used is = 0.4, Qa = 0.6, h = 0.65, a% = 0.79, and a large-scale primordial power spectrum slope 
n = 0.95. This model is in agreement with the large set of observations of large-scale structure 
currently available (e.g., Wang et al. 1999). The box size of the simulation is lO/i"^ Mpc, and it 
contains 288^ cells. Lya spectra are computed for a large number of lines of sight along the box 
axes. There is one free parameter that we can vary when computing the spectra, the normalization 
of the optical depth, which we adjust to reproduce the observed transmitted flux, as we show in §3. 
Renormalizing the optical depth is equivalent to modifying the intensity of the ionizing background, 
as long as the effect of collisional ionization and the change in the gas temperature caused by the 
different heating rate can be neglected (see Theuns et al. 1998for a test that these effects are in 
fact negligible; notice that collisional ionization is actually important in high temperature gas in 
the simulations, but this gas is always at high density and produces saturated absorption in Lya 
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). The optical depth is then mapped to transmitted flux using F = exp(— r). 

These theoretical spectra are then modified to account for the continuum fitting, resolution and 
noise in the observations. Of these three effects, the most difficult to reproduce is the operation of 
continuum fitting, where several points along the spectrum of the quasars that appear not to have 
any obvious absorption are selected to indicate the quasar continuum. This operation is inevitable 
if the intrinsic spectrum of the quasar is unknown. Unfortunately, there is not really a good way 
to correct for it because the spectra obtained from simulations obviously have the period of the 
simulated box, which is comparable to the typical distance between successive continuum fitting 
points. 

Following Ranch et al. (1997), we estimate the effects of continuum fitting by defining the 
maximum transmitted flux along any line of sight in the simulation (parallel to one of the three 
axes) to be the continuum flux, Fc, and dividing the flux in all other pixels by Fc- We convolve 
with the instrumental resolution of 6.6kms~^, and wc add Gaussian noise to each cell with the 
dispersion n{F), which is given below in Table 3. The dispersion n{F) is computed as the mean 
noise in all pixels in the observed spectra for every flux bin. Notice that the noise increases with 
the flux value because there is a constant background noise plus the Poisson noise of the photon 
counts in each pixel. We also map the 288 cells along an axis in the simulation onto 512 pixels in 
the spectra, to facilitate the computation of Fourier transforms. The noise in Table 3 is for the pixel 
width in the observational data, 0.04 A. When we compute the power spectrum and correlation 
function this noise level is multiplied by (0.04 A/ AXsim)^^"^ where AXsim is the wavelength extent 
of the pixels in the simulated spectra. 

In Ranch et al. (1997), an additional correction to the observed spectra was applied to correct 
for the redshift evolution within a redshift bin before a comparison is made with a simulation at 
a given redshift Zj. This consisted of multiplying the optical depth in pixels within a redshift bin 
around Zi by the factor [(1 + z)/{l + Zi)]'^'^, so that all the observed pixels are corrected to a 
redshift Zi according to a law that approximately fits the observed evolution. We do not apply this 
correction here. The correction is small for the redshift bins we shall use; nevertheless, it must 
be born in mind that our observational results for the flux distribution and power spectrum are 
averages over each of our redshift bins. The reason we do not include this correction is that it 
changes the statistical distribution of the pixel noise in a way that biases some of the results. Not 
including the correction also maJies it easier to compare to any simulation in an accurate way: after 
the continuum fitting, resolution and noise are included in simulated spectra, one can average the 
quantity being compared over the same redshift bins used here. 



3. THE MEAN AND VARIANCE OF THE TRANSMITTED FLUX 



In this section we present results for the mean and variance of the transmitted flux, with error 
bars calculated using the resampling method described in Appendix B. We then revisit the value 
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of derived in Rauch et al. (1997). 

The mean transmitted flux F is computed by averaging over all the pixels in a certain data 
subset (either a given quasar or a redshift bin) . Figure 2 shows the mean transmitted flux with error 
bars for each quasar. Figure 3 shows the variance of the flux for each quasar, cr^ = ((F — -F)^) — 
'^noise- '^^^ variance contributed by the pixel noise, crnoise^ generally negligible. The pixel noise 
does not significantly affect the error bars on the mean and variance of the transmitted flux, which 
are dominated by the fluctuations in the number of absorbers in the spectra. We notice that 
the increase of aj^ with redshift is due to the increasing mean flux decrement (the true density 
fluctuations are of course decreasing with redshift due to gravitational clustering evolution). 

The results for F and cr|. are presented in three redshift bins, centered at z = 2.41, z = 3.00, 
and z = 3.89, in Table 1. Each of these bins contains about one third of the data. We will use 
the same redshift bins for presenting observational results in the rest of the paper. The decrease in 
fluctuations with increasing F accounts for the smaller error bar on the mean transmitted flux at 
z = 2.41. 



3.1. Cosmological Implications of the Mean Flux Decrement Revisited 

The value of the mean transmitted flux is related to the parameter: 



0.0125 J 



lOOkms-^Mpc"^ 



1 



-12 



(1) 



where F = 10~^^F_i2S~^ is the photoionization rate due to the cosmic ionizing background. The 
constant ^ includes the simple dependences on cosmological parameters that arise from photoion- 
ization equilibrium: if the spatial distribution of the ovcrdcnsity, temperature and peculiar velocity 
of the gas is not altered, the optical depth at every pixel is proportional to jj, when O;,, Hq and F 
are varied. 

Table 2 presents the value of /j, required for the mean transmitted flux, predicted directly from 
our simulation, to match the observed one, with and without the continuum fitting correction. The 
values are given for the same three redshift bins used before (see Table 1). We derive fi using two 
simulation outputs for two of the redshift bins of the observational data. We shall see in §5 that 
comparing the data to the simulation outputs at different redshifts is approximately equivalent to 
varying the amplitude of the power spectrum in the model, and that the two simulation redshifts 
used in Table 2 bracket the amplitude that is inferred from our data. In addition to the amplitude 
of the power spectrum, the simulation outputs at different redshifts also differ on the distribution 
of temperatures. The mean temperature at overdensity of unity is given in the last column of Table 
2). The differences in the derived fi for different simulation outputs in Table 2 are due to both the 
different temperatures and the declining fluctuation amplitude with redshift. 

The values of /x we find are very similar to those in Rauch et al. (1997); the slight differences 
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Fig. 2. — Mean transmitted flux, F = (F), for each quasar in the sample. The error bars in the z 
direction show the redshift range covered by each spectrum while the point shows the mean redshift. 
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Fig. 3. — Variance of the transmitted flux, ap = {{F — F)^) — cr^^g^, for each quasar in the 
sample. The error bars in the z direction show the redshift range covered by each spectrum while 
the point shows the mean redshift. The variance decreases with decreasing redshift because the 
mean transmitted flux is increasing. 
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Table 1. Mean and variance of the transmitted flux. 





^max 


z F aj, 


{6F6ap) pixels 


3.39 


4.43 


3.89 0.475 ±0.021 0.1293 ± 0.0030 


-1.0 X IQ-^ 34320 


2.67 


3.39 


3.00 0.684 ±0.023 0.1174 ± 0.0056 


-1.1 X 10"'' 31897 


2.09 


2.67 


2.41 0.818 ±0.012 0.0789 ± 0.0068 


-7.7 X 10-5 33791 


Note. 


— The mean flux decrement F, the flux variance a^, their error correlation, and the total 


number of data pixels are listed for each redshift bin {zmin-, Zmax), with mean redshift z. 




Table 2. The optical depth normahzation /j, oc {Q.})h?)'^{H{z) r_i2) ^ 


^obs 




IJ, (cc) // (ncc) r_i2 










(K) 


2.41 


2 


1.58 ±0.20 1.51 ±0.19 0.545 ±0.070 


0.0257 ±0.0017 13100 


2.41 


3 


1.36 ±0.17 1.30 ±0.16 0.630 ±0.082 


0.0239 ± 0.0016 16000 


3.00 


3 


1.66 ± 0.28 1.51 ± 0.23 0.412 ± 0.068 


0.0296 ± 0.0024 16000 


3.00 


4 


1.34 ±0.21 1.20 ±0.18 0.514 ±0.082 


0.0266 ± 0.0021 14100 


3.89 


4 


1.44 ± 0.17 1.19 ± 0.13 0.356 ± 0.043 


0.0319 ± 0.0019 14100 



Note. — Computed for the observational redshift bin Zobs by comparison to the simulation at 
Zsim, with (cc) and without (ncc) the continuum correction. We give r_i2 assuming Qbh^ = 0.019, 
and 0,bh^ is given assuming r_i2 = 1, both including the continuum correction. The median 
temperature at the mean density for the appropriate simulation output is given as To^sim- 
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we find when comparing to the same simulation output are due to the inclusion of a new quasar in 
our sample, the different choice of redshift intervals, and differences in the method of analysis (such 
as not including the redshift correction to the center of each redshift bin). We give also in Table 2 
the statistical error bars due to the sample variance. In addition, we show the value of Qbh'^ when 
r_i2 = 1, and the value of r_i2 when rifjh^ = 0.019, fixing H{z) to the model h = 0.65, f^o = 0.4, 
Aq = 0.6. We recall here that the observed abundance of quasars provides a lower limit r_i2 > 1 
at z = 3, and therefore a corresponding lower limit to ^jji^ , which is a bit high compared to the 
value derived from nucleosynthesis and the deuterium abundance of Buries &; Tytler (1998) (see 
Ranch et al. 1997; Weinberg et al. 1997). 

In addition to the jU parameter, how should the predicted mean transmitted flux depend on 
the specific large-scale structure model that is assumed? As we shall discuss later in §7, the dom- 
inant dependence should be on the gas temperature-density relation, and the density fluctuation 
amplitude at the Jeans scale (Hui k, Gnedin 1997; Nusser &; Haehnelt 1999see, e.g.,). Rescaling the 
gas temperature Tq at a fixed overdensity will result in a variation jx cx T^-^ for a fixed mean trans- 
mitted flux, owing to the variation of the recombination coefficient with temperature. Changing 
the temperature will also change the distribution of absorption, for a fixed real space distribution 
of gas, by altering the thermal broadening. In general, increasing the thermal broadening will 
increase the amount of absorption and so decrease jJL for a fixed mean flux decrement. The full 
dependence on the gas temperature (which is affected by radiative cooling, shock heating, etc.) is 
more complicated; however, these two effects incorporate most of the temperature dependence. 

The fluctuation amplitude at the Jeans scale in the simulation we use is probably close to the 
correct value in our Universe given the good agreement we flnd later (§4 and 5) in the transmitted 
flux distribution function and the power spectrum. This limits the model uncertainty in the value 
of ^ derived from the simulation we use. However, the dependence on the density distribution is 
strong because the density of neutral gas is proportional to the square of the baryon density. 

The different values of /x derived from different redshift outputs give an idea of the importance 
of the model dependence on the gas temperature and the power spectrum amplitude. A more 
detailed analysis using simulations where these parameters are varied will be necessary to assess 
the errors due to the theoretical model uncertainty on the value of jjl. 

4. THE PROBABILITY DISTRIBUTION OF THE TRANSMITTED FLUX 

We present here the probability distribution function (hereafter, PDF) of the transmitted flux, 
flrst used as a tool to study the Lya forest by Jenkins &; Ostriker (1991). The PDF of the same 
observations used here (except for our addition of an eighth quasar in the sample) was presented 
before in Ranch et al. (1997). The results will be given here in differential form and with error 
bars computed as described in Appendix A. The PDF was measured from the observations with 
all pixels weighted equally. Table 3 gives P{F) and the noise amplitude n{F), which is necessary 
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Table 3. The observed probability distribution of the transmitted flux. 



F 


P{F) {z = 3.89) 


n{F) 


P{F) {z = 3.00) 


n{F) 


P{F) {z = 2.41) 


n{F) 


0.00 


3.618 ±0.288 


0.024 


2.032 ± 0.229 


0.0064 


0.744 ± 0.129 


0.017 


0.05 


1.472 ± 0.103 


0.027 


0.463 ± 0.043 


0.0072 


0.327 ± 0.043 


0.022 


0.10 


0.666 ± 0.049 


0.026 


0.340 ± 0.036 


0.0077 


0.214 ±0.025 


0.022 


0.15 


0.528 ± 0.037 


0.027 


0.300 ± 0.028 


0.0081 


0.180 ± 0.020 


0.021 


0.20 


0.524 ±0.040 


0.027 


0.279 ± 0.027 


0.0086 


0.176 ±0.019 


0.022 


0.25 


0.536 ± 0.040 


0.027 


0.315 ± 0.034 


0.0086 


0.195 ±0.020 


0.022 


sn 




n 027 


n 907 + n 027 


0088 


n oflfi + n 020 


023 


0.35 


0.534 ± 0.039 


0.027 


0.322 ± 0.030 


0.0095 


0.189 ±0.020 


0.024 


0.40 


0.571 ± 0.036 


0.028 


0.319 ± 0.027 


0.0099 


0.212 ± 0.023 


0.023 


0.45 


0.549 ± 0.038 


0.029 


0.351 ± 0.028 


0.0102 


0.213 ± 0.020 


0.025 


0.50 


0.598 ± 0.038 


0.030 


0.372 ± 0.031 


0.0099 


0.249 ± 0.023 


0.025 


0.55 


0.623 ± 0.043 


0.030 


0.448 ± 0.041 


0.0101 


0.267 ± 0.024 


0.025 


0.60 


0.706 ± 0.048 


0.031 


0.494 ± 0.038 


0.0104 


0.277 ± 0.024 


0.025 


0.65 


0.776 ± 0.051 


0.031 


0.584 ± 0.043 


0.0107 


0.354 ± 0.029 


0.027 


0.70 


0.856 ± 0.056 


0.031 


0.688 ± 0.044 


0.0114 


0.381 ± 0.028 


0.026 


0.75 


0.973 ± 0.060 


0.032 


0.831 ± 0.059 


0.0118 


0.566 ± 0.038 


0.028 


0.80 


1.119 ±0.071 


0.032 


1.067 ±0.069 


0.0117 


0.774 ± 0.050 


0.027 


0.85 


1.235 ±0.081 


0.032 


1.570 ±0.094 


0.0121 


1.231 ±0.063 


0.029 


0.90 


1.302 ± 0.095 


0.032 


2.180 ±0.135 


0.0127 


2.288 ± 0.097 


0.030 


0.95 


1.200 ± 0.107 


0.032 


3.355 ± 0.169 


0.0130 


4.480 ± 0.143 


0.029 


1.00 


1.068 ± 0.128 


0.032 


3.392 ± 0.248 


0.0138 


6.477 ± 0.256 


0.031 



Note. — The flux PDF, P{F), and the average rms noise per pixel, n{F), are averaged over 
bins covering the flux range within AF = ±0.025 of the listed value of F. The flrst and last bins 
include the few additional points with F < —0.025 and F > 1.025, respectively. 
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for comparisons with theory. We define n{F) to be the average of ai over all the pixels in each 
flux bin, where cjj is the noise in each pixel. The noise increases with increasing F (note that the 
z = 3 data has significantly less noise than the other two rcdshift bins). We use 21 bins of width 
AF = 0.05 with the first centered on F = and the last on F = 1. Pixels with fiux greater (less) 
than F = 1.025 {F = —0.025) are included in the last (first) bin. We provide statistical error 
bars on the probabilities to facilitate comparisons with theory but it is important to recognize that 
these error bars are significantly correlated. If a statistic is computed from only these diagonal 
elements of the error matrix the distribution will be approximately twice as wide as the distribution 
of properly computed from the full error matrix. The error matrix is available in the website 
quoted in the introduction. 

Figures 4(a,b,c) show the PDF of the observations and the simulation. The mean transmitted 
flux in the simulation is matched to the observed mean. We compute a statistic (using the full 
error matrix) to compare the observations and simulation. The results are x^/^ = 1.1 (with u = 19, 
39% likelihood) for z = 3.89, x^/z/ = 1.7 (3% likelihood) for z = 3, and x^/^ = 4.4 (negligible 
likelihood) for z = 2.41. Figures 4(a,b,c) show the effect of our continuum fitting approximation on 
the simulated PDF. This correction is important to the agreement seen in the two higher redshift 
comparisons. 

The almost perfect agreement of the predicted flux distribution in the simulation and the 
observed distribution is impressive, and is one of the strong pieces of evidence in favor of the new 
Lya forest theory based on gravitational evolution of primordial fluctuations. We believe that even 
the small disagreement between the simulation and observations at z = 2.41 (Figure 4(a)) is due 
primarily to the imperfection of our continuum fitting approximation. Since the inclusion of the 
continuum fitting correction moves the predicted PDF in the direction of the observed one, as seen 
in Figure 4, it is plausible that the small remaining discrepancy is due to an underestimate of this 
correction. We have examined other possible reasons for this discrepancy, and they were all found 
to be not significant. One of these is that the noise that we include in the simulations (assumed 
to have a Gaussian distribution for a given fiux F, obtained from the average of the variance of 
all pixels with flux F in Table 3) does not adequately represent the true distribution function of 
the noise, which is actually a sum of Gaussians with the variance distribution of all the pixels 
with flux F in the observations. We tested this by computing the predicted P{F) given the actual 
distribution of noise, and found that the difference it makes is much smaller than the differences 
between prediction and observation at z = 2.41. Thus, the average noise values we give in Table 3 
should be sufficient to compare results of any other numerical simulations with the observational 
data presented here on P{F). Another effect is that observational results are obtained by averaging 
over a certain rcdshift interval, whereas the simulation predictions are for a fixed redshift. The 
importance of this effect was tested by creating sets of spectra where each line of sight had a 
different value for /i, given by /i = /7,o [(1 + z)/{l + z)]^'^, where /ig is the value of /i at the mean 
redshift and z is varied across the full width of the redshift bin in the observational data; there was 
negligible change in the predicted PDF of the fiux. 
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Fig. 4. — The PDF of F for the observations {histograms) and for the simulation with the continuum 
fitting approximation [filled points) and without it {open points). The small number of points 
outside the displayed range of F are included in the outermost bins. Errors bars were generated by 
bootstrap resampling. The numerical simulation has F fixed to agree with the observations, (a) 
shows z = 3.89, (b) shows z = 3.00, and (c) shows z = 2.41. 
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Why is there such a good agreement between the predicted P{F) and the observed one, after 
having adjusted only one parameter (the mean transmitted flux)? To understand the significance 
of this result, it is useful to think of the optical depth on a given pixel as being determined mostly 
by the gas density and temperature at a given point in space, r oc p^r~°-^ (from photoionization 
equilibrium). Given a p — T relation determined by the photoionization history (Hui &; Gnedin 
1997), the predicted P{F) should essentially be a result of gravitational evolution starting from 
Gaussian initial conditions, and it should mostly depend on one parameter only (in addition to the 
mean transmitted fiux): the amplitude of fluctuations on the Jeans scale. Obviously, the higher 
the amplitude of density fluctuations, the larger the dispersion in the transmitted flux should be. 
A weaker dependence on the p — T relation can also be expected. This suggests two possible 
implications of the good agreement of the prediction of P{F) with the observations, which will 
need to be examined further: (a) the Lya forest is indeed a result of gravitational evolution in a 
photoionized IGM starting from Gaussian initial conditions; (b) the amplitude of fluctuations in 
the ACDM model assumed in the simulation is close to the true value in the universe. 

In the next Section, our analysis of the power spcctrTini will show that the amplitude of fluc- 
tuations in the simulation we use should actually be reduced by ~ 15% to match the observations. 
However, the finite size of the simulated box reduces the efl"ectivc power, resulting by chance in a 
value of the variance of the transmitted flux that matches very well the observations (see §6). 



In this section we compute the one dimensional power spectrum of the transmitted flux, Pp(A:). 
The flux power spectrum is the most straightforward two-point statistic that can be measured 
from the data. Hopefully this will make the results more generally useful for comparisons with 
analytic theory, numerical simulations, and other observations. We shall then study the relation 
of the flux power spectrum to the linear mass power spectrum using the numerical simulation, 
on scales that are large enough to make the fluctuations in Lya absorption be related to linear 
density fluctuations. This relation in the large-scale limit is further discussed in Appendix C. 
Although there are theoretical reasons to Gaussianize the transmitted flux before obtaining the 
power spectrum (Croft et al. 1998) this operation can amplify the effect of the noise, and its merit 
in improving the recovery of the linear power spectrum has not been made clear. 

The data is given in the form of pixels with wavelength label Aj and flux value Fj. We measure 
distance between pixels in units of the local velocity scale using the formula 



where A = Aa(l -|- z) is the wavelength at the mean redshift, z, of any given data subset, and 
Arj is the comoving distance between pixel i and a pixel at the mean redshift, where we have 
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THE POWER SPECTRUM 
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assumed an Einstein-de Sitter universe for this calculation. With this formula, separations are 
precisely proportional to comoving distance for an Einstein-de Sitter universe, and are a close 
enough approximation for other cosmologies for the redshifts intervals we shall use. The power 
spectrum is then estimated from each spectrum using the Lomb periodogram code in Press et al. 
(1992). This algorithm is designed for use on unevenly sampled data (the uneven sampling in our 
case is due to the removal of the chunks in the spectra containing damped Lya and metal lines, 
and to the change in pixel size with redshift), and avoids rebinning of the data. The computed 
modes are averaged over bins evenly spaced in log(k). 

5.1. Results for the Observed Ppik) 

Table 4 lists the results of the flux power spectrum measurements from the observational data, 
over the range 0.0025 (kms^^)^^ < k < 0.16( kms^^)^-*^ (the power at k < 0.0025( kms~"^)~-^ is 
possibly distorted by the continuum fitting operation, and at A: > 0.16(kms^^)^^ the power is 
strongly affected by narrow metal lines and other systematic errors that are discussed below) . Note 
that the velocity scales represent different comoving scales at each of the three redshifts. The error 
bars were determined as described in Appendix B; for the power spectrum they are approximately 
independent (as expected in linear theory). Our normalization convention is that the rms flux 
fluctuation is ap = J'^^{dk/2'K) Pp{k). 

The observed Ppik) is plotted in Figures 5(a,b,c) (crosses with error bars) for the three usual 
redshift bins. Notice that we plot the quantity kPplk), for easier visualization. We also plot the 
power spectrum measured from the simulation (solid line), with the mean flux decrement adjusted to 
match the observation in every redshift interval. There is generally very good qualitative agreement 
between the simulations and observations on large scales {k < 0.1(kms~"^)~^). On small scales 
{k > 0.3( kms~^)~^), the simulated power spectrum is constant due to the noise that we add, 
matching that in the observations. The diflFerent behavior of the observed Ppik) is due to various 
effects that we shall now describe. 

The measurement of the power spectrum is complicated by the possible presence of metal lines 
in the spectra. Because the metal lines are narrower than the Lya forest lines, they can affect the 
power spectrum on very small scales. The numbers listed in Table 4, shown as crosses in Figure 
5, exclude potentially contaminated regions, but some of these regions might contain genuine Lya 
lines that were selectively eliminated as suspected metal lines because they are narrow. The solid 
triangles in FigTuc 5 show the power spectrum with these regions included (except for a few regions 
that contain damped Lya systems or bad data points). We see that the suspected metal lines are 
significant in adding small scale power. This problem in the determination of the power spectrum 
occurs on the smallest scales only (starting with the fifteenth bin in Table 4). 

The removal of chunks of spectra suspected of containing metal lines changes the effective 
window function for the power spectrum measurement. To show the magnitude of this effect, we 
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Tablo 4. The observed power spectrum of the flux. 
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0.00713 


16.9 ±3.1 


10.8 ± 1.6 


11.4 ±1.9 


0.00795 


0.00898 


15.4 ±2.4 


10.1 ± 1.7 


10.2 ± 0.95 


0.01 


0.0113 


9.44 ±1.1 


9.08 ± 1.9 


7.5 ± 1.6 


0.0126 


0.0142 


8.26 ± 1.4 


8.07 ± 1 
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0.0179 
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Fig. 5. — The observed one dimensional power spectra computed with and without including the 
regions possibly contaminated by metal lines. The points with error bars show Ppik) computed 
after excluding the possibly contaminated regions while the triangles show the points computed 
from the complete spectra. The solid lines show the power spectra from the simulation with the 
mean transmitted flux adjusted to match the observations, (a) shows z = 3.89, (b) shows z = 3.00, 
and (c) shows z = 2.41. 
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Fig. 6. — The power spectrum measured from randomly generated spectra with input power spec- 
trum shown by the sohd curve (see Appendix A). The triangles show the power measured from the 
full spectrum, while the squares show the power measured after the regions that are suspect in the 
real data are removed. The power is increased on the smallest scales when the window function is 
modified by removing the chunks. Figure 5 shows the opposite effect because the removed regions 
in the real data contain narrow lines. The flattening of both sets of points on the smallest scales is 
because of the noise added to the random spectra. 
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have randomly generated Gaussian spectra with the one-dimensional, linear power spectrum shown 
by the solid line in Figure 6, with a total length in velocity equal to the size of the observational 
sample used here in each redshift interval. The procedure we have used for generating these mock 
spectra is described in detail in Appendix A. We have then measured the power spectrum with the 
same method used for the observations, with and without removing the same chunks of the spectra 
that contain metal lines in the real observations. The measured Ppik) points for the case where 
the chunks are not removed (solid triangles) appear to follow the input power spectrum (except for 
statistical fluctuations at small k), until the power is small enough that the added noise begins to 
dominate. When the chunks are removed (solid squares), the change in the window function results 
in increased power on small scales. In the observational data the removed chunks contain narrow 
lines which have a greater effect on the power spectrum than the change in the window function, 
so the power decreases when the chunks are removed, except for Zghs = 3.89 where the two effects 
approximately cancel. 

In the rest of §5, we do a quantitative comparison of the observed and simulated power spec- 
trum. We will not use the measurements on small scales where the effects of the metal lines and 
the window function are important. 



5.2. Fitting Formula for Ppik) 

The mass power spectrum of the linear density perturbations for cold dark matter models can 
be fitted by the form 

PsDiq) = A g-[ln(l + aig)Kg]^ ^ (3^ 

[1 -I- a2q + {asqy + {a4q)^ + {a^qY] 2 

where q = k/r{z), T{z) = (1 -|- z)Q,Qh? / H{z) (when k has units of inverse velocity), and A is 
a normalization constant. The subscript 3D is written here to remind us that this is the power 
spectrum for fluctuations in three-dimensional real space. The formula is given by Bardeen et al. 
(1986), but we modify the parameters to the fit for = 0.05: ai = 2.205, ^2 = 4.05, 03 = 18.3, 
04 = 8.725, and = 8.0 (Ma 1996). The amplitude parameter A has units of (kms~^)^. For the 
initial conditions of our simulation, n = 0.95, T{z) can be found using Oq = 0.4, Q.\ = 0.6, and 
h = 0.65, and then A can be found from as = 0.79. 

Croft et al. (1998) found that multiplying the three dimensional linear theory power spectrum 
by the smoothing function exp{—k'^wl), with Wc — 34kms~''^ at 2; = 3, matches the Lya forest 
power spectrum obtained in their simulations. We also find that this function fits the output of our 
simulation reasonably well down to remarkably small scales, although wc obtain smaller values of 
the 3D Gaussian cutoff: Wc — 12kms^"'^. It seems probable that, as Croft et al. (1999) speculated, 
the high Wc found by Croft et al. (1998) is a result of their lower simulation resolution. Their mean 
particle separation was close to 34kms~^. 
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Here, we will fit the one-dimensional flux power spectrum in the observations and simulation 
with the following formula: 

Ppik) = Ap "^P^:^'^'^ r dk' k'PMk') , (4) 

where Psoik) is given by equation (3). We will first obtain fits with the two free parameters Ap 
and Vc, and then add the slope n in §5.5. Equation (4) applies a Gaussian smoothing directly to 
the ID power spectrum, instead of smoothing in three dimensions and then converting to the one- 
dimensional spectrum (as done in Croft et al. 1999). We will find later (in Fig. 11) that equation 
(4) agrees well with the low-fc [< 0.04 ( kms^^)~^] fiux power spectrum of the simulation and, most 
importantly, the best fit for the slope parameter coincides with the real space linear theory slope 
predicted from the initial perturbations. While the 3D Gaussian smoothing provides a better fit 
for the power spectrum on very small scales than the ID Gaussian smoothing, we have found that 
the best fit parameters with the 3D smoothing do not return the correct value of the spectral index 
n of the model used in the simulation (the basic reason is that the effect of the 3D smoothing on 
Ppik) extends to larger scales than the ID smoothing). This is not inconsistent with Croft et al. 
(1998), because we extend the fit to smaller scales. 

In Appendix C, we give a justification of the fitting formula in eq. (4) in the limit of large 
scales, showing that the flux power spectrum must be proportional to the one-dimensional mass 
power spectrum (see also Scherrer Sz Weinberg 1998). 

Notice that in strict linear theory we should include the effects of peculiar velocities in the 
transformation from three to one dimensional power, which affect the shape of the power spectrum 
(Hui 1999; McDonald & Miralda-Escude 1999a). However, these linear theory results work only on 
extremely large scales, and in the range of scales we explore here (where the density fluctuations 
are not much smaller than unity), the higher-order effects are important and opposite to the linear 
effects. Like Croft et al. (1998), we find that the slope of the simulation power spectrum on large 
scales is consistent with the real space linear theory slope (or equivalently, the peculiar velocity 
parameter /? is quite small). 

We first fit equation (4) to the observations and simulations leaving as free parameters the 
amplitude A and the Gaussian cutoff Vc only. All the other parameters are held fixed at the values 
they have in the simulation in linear theory (we will let the spectral index n vary later, in §5.5). 
Table 5 gives the best fit values of these two parameters. For each simulation output (at redshift 
z = 2,3, 4) we perform three separate fits using the mean flux decrements F = 0.475, 0.684, and 
0.818. The result for the amplitude parameter is listed in terms of the contribution to the variance 
per unit interval of Ink, A'^j-){kp) = {27r'^)~^ kpPsoikp) [where Psoik) is given by eq. (3) with the 
value of A to flt Pp in eq. (4) ]. The value of kp is kp = 0.04 (kms"^)"^ at 2; = 3, and is held 
constant in comoving coordinates. The reason for this choice of kp will be made clear below, in 
§5.6. 

In order to perform the same flts to the flux power spectrum predicted by the simulation, 
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Table 5. Fitted parameter values for equation (4). 



Ppik) from: 


z 


F 




(kms-1) 




v 


obs. 


3.89 


0.475 


0.0392 ± 0.0024 


21.6 


1.2 


10 


obs. 


3.00 


0.684 


0.0370 ± 0.0021 


25.4 


0.77 


10 


obs. 


2.41 


0.818 


0.0321 ±0.0021 


28.8 


1.4 


9 


sim. 


4 


0.475 


0.0436 ± 0.0010 


21.9 


2.2 


5 


sim. 


4 


0.684 


0.0370 ± 0.0009 


20.6 


1.9 


5 


sim. 


4 


0.818 


0.0247 ± 0.0006 


19.2 


2.6 


5 


sim. 


3 


0.475 


0.0511 ± 0.0014 


23.8 


1.4 


5 


sim. 


3 


0.684 


0.0442 ± 0.0012 


23.0 


1.3 


5 


sim. 


3 


0.818 


0.0290 ± 0.0007 


21.5 


1.6 


5 


sim. 


2 


0.475 


0.0635 ±0.0019 


29.4 


0.36 


3 


sim. 


2 


0.684 


0.0544 ± 0.0018 


28.7 


0.26 


3 


sim. 


2 


0.818 


0.0347 ± 0.0012 


26.7 


0.11 


3 



Note. — The fitted power spectrum amplitude is given in terms of A|,(A;p) = (27r^) ^kpPpsDikp) 
where kp = 0.04 (kms~^)~^ at z = 3 (and is held constant in comoving coordinates). 
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we have generated approximate error bars by dividing the hnes of sight through the simulation 
box into twelve groups and measuring the dispersion among them. The groups are defined by 
the four quadrants of each of the three faces of the simulation cube. We expect that this is an 
underestimate of the actual errors that would be found if many simulation boxes were available 
because the twelve groups are not independent, but these error bars should give reasonable best 
values for fitted parameters. 

Even though the power spectrum of the optical depth should have a sharp cutoff at the thermal 
broadening scale, the transformation to transmitted flux creates small scale power, so the Gaussian 
cutoff cannot be expected to fit on small scales. We choose the maximum k values to use in the 
fits (kmax) by estimating where equation (4) begins to fit the simulation outputs poorly. We use 
kmax = 0.04 (kms^^)^^ for Zobs = 3 and Zobs = 3.89, and kmax = 0.032 (kms""*^)"^ for Zobs = 2.41. 
The cutoff kmax corresponds roughly to v^^ and decreases with time because Vc increases with time. 

The values of foi" the fits all have probabilities higher than 5% with the exception of the fit 
to the z = 4 output of the simulation with F = 0.818. The probability of exceeding its value for 
the worst fit is ~2%, but this is probably acceptable because our simulation error bars are likely 
to be slightly underestimated. 

5.3. Redshift Evolution of the Power Spectrum 

If the Lya forest is in fact governed by gravitational evolution, we know that the amplitude of 
the fluctuations should generally increase with time. In the linear regime, and when Cl{z) ~ 1 (a 
very good approximation in the model we use at z > 2), the amplitude should grow proportionally 
to the scale factor. However, when analyzing the flux power spectrum, its amplitude is affected by 
the mean flux decrement which is also evolving with redshift. 

The observed flux power spectrum at all three redshift bins is plotted as the symbols (pentagons 
at 2: = 4, squares at 2: = 3 and triangles at 2; = 2) in Figure 7. Error bars are shown at 2 = 4 only 
to avoid cluttering. The lines are analytic fits to the data points using equation (4) . These analytic 
fits will be used in detail below to measure the amplitude of the power spectrum at each redshift, 
but here we use them only to qualitatively visualize the redshift evolution. The flux power spectrum 
increases with redshift, contrary to the expected decrease of the mass power spectrum. The reason 
is the fast increase in the mean flux decrement with redshift. We can understand the effect of the 
mean flux decrement by considering the limiting case where all absorbers are optically thin; clearly, 
for fixed density fluctuations the amplitude of the flux fluctuations should grow proportionally to 
the flux decrement. 

To sec that the power spectrum decreases with redshift as expected when the effect of the 
flux decrement evolution is removed, we plot in Figure 8 the flux power spectrum of the numerical 
simulation at the redshift outputs z = 2,3,4, normalizing the optical depth to yield a fixed mean 
flux decrement F = 0.684 at all three redshifts. In the limit of large scales, the power is now 
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Fig. 7. — The observed one dimensional power spectrum of F along the line of sight for three 
redshift intervals. The points show the observed Ppik) values while the lines are analytic fits to 
the points. The redshifts z = 3.89, 3.0, and 2.41 are symbolized by pentagons and the solid line, 
squares and the dotted line, and triangles and the dashed line, respectively. The power is reduced 
with decreasing redshift because of the change in the mean flux decrement. The fits are obtained 
using only points at k smaller than the left vertical dotted line for z = 2.41, and the right vertical 
dotted line for z = 3.0 and z = 3.89. 
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Fig. 8. — The simulated flux power spectrum for the three redshift outputs with optical depths 
scaled to produce F = 0.684. The points show the values of Ppik) at the modes of the periodic 
simulated box, and the lines are analytic fits to the points. The redshifts z = A, 3, and 2 are 
symbolized by pentagons and the solid line, squares and the dotted line, and triangles and the 
dashed line, respectively. We see that the large scale power increases with time as expected when 
the mean flux decrement is fixed. The fits use only points to the left of the leftmost vertical dotted 
lines for z = 2, and the rightmost vertical dotted line for the other two redshifts. Note that the 
velocity scale corresponding to a given comoving scale is different for the three redshifts. 
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increasing as expected. But if the simulated spectra are normalized instead to the observed flux 
decrements at z = 2.41, 3.00 and 3.89, then they have an inverted evolution matching the observed 
one, as was shown in Figures 5a,b,c. 

We notice that the flux power spectrum still decreases more slowly than in Figure 8. The 
reason for this is more complex: in the limit of large scales, the flux power spectrum should generally 
be related to the mass power spectrum by a constant factor (see Appendix C), however this factor 
depends on the amplitude of fluctuations on the Jeans scale, which changes with time. The factor 
also depends more weakly on the gas density-temperature relation and other quantities which are 
changing with redshift. 

5.4. The Amplitude of the Observed Power Spectrum Relative to the Simulation 

We have already compared qualitatively the observed flux power spectrum observations and 
the numerical simulation in §5.1. Now we want to use this comparison more quantitatively to 
determine the amplitude of the primordial density perturbations required to fit the observations. 
The most straightforward method to implement this fit would be to have several simulations with 
different amplitudes for the power spectrum. However, since we have only one simulation, we 
shall instead use the three redshift outputs at z = 2, 3, 4 as being equivalent to the results of three 
different models at the same redshift, having amplitudes of the initial fiuctuations in the proportion 
3:4:5. This assumes a self-similarity in the evolution of the Lya forest, where reducing the initial 
amplitude of the power spectrum is equivalent to reducing the scale factor by the same factor. This 
self-similarity is exact for the dark matter (and assuming Q{z) ~ 1). For gas it is broken on small 
scales by the hydrodynamic effects of the temperature. We will be neglecting here any changes 
in the relation between the flux and the mass power spectra that are due to a difference in the 
effects of the gas temperature at different rcdshifts. Any such changes should probably be highly 
dependent on the heating mechanisms in the IGM, affected by the model of the ionizing sources 
and of reionization that is adopted (see Miralda-Escude &: Rees 1994; Hui &; Gnedin 1997). 

We start by comparing each one of the three simulation outputs at redshifts Zsi^n — 2, 3, 4 
to each of the three redshift bins in the data, Zobs = 2.41,3.00,3.89. For each one of the nine 
comparisons, we go through the following steps in order to compute a statistic measuring the 
degree to which the simulation is consistent with the observations: 

1. We set the mean transmitted flux F in the simulation to match the F of the observation, 
by rescaling the optical depth. This is an important step because, as we have seen in §5.3, the 
amplitude of the flux power spectrum is highly sensitive to F. 

2. We fit the parameters Ap{kp) and Vc (defined in §5.2) to the simulated power spectrum 
using equation (4), keeping n and F fixed to the values of the initial power spectrum assumed in 
the simulation. 
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3. We obtain the dimensionless quantity kPpik) from the fit to the simulated power spectrum 

at redshift Zgim (expressing k in units of the Hubble velocity), and transform it to redshift Zof,^ by 
rescaling k by the factor [H {zsim) / H [zobs)] [(1 + -2ofes)/(l + Zsirn)]-, in order to compare the power 
spectra of the observations and simulations at the same comoving scale. 

4. We compute the statistic from the difference between the observed power spectrum and 
the fit to the simulation at every k bin of the observational data, up to kmax = 0.032 km s~^ for all 
redshift bins, using the error bars of the observations. 

In Figure 9 we compare the fits to the power spectra from all three simulation outputs to the 
observed Ppik) in all three redshift bins. The values of x^/i' for all of these comparisons (where 
u is the number of degrees of freedom) are listed in Table 6. It is clear from Figure 9 that the 
amplitude of the power spectrum needs to be decreased in order to match the observations at large 
scales. From Table 6, we can see, for example, that the observations at Zof,s = 3 are consistent with 
the simulation at Zgim = 4, but inconsistent at Zgim = 3; this yields an upper bound of cig < 0.79 for 
the power spectrum normalization of the ACDM model of the simulation, since the amplitude has 
to be lower than that used in the simulation. At the same time, the observation at Zgbs = 2.41 is 
significantly better fit by the simulation output at Zgim = 3 than the one at Zsim = 4; to within the 
significance level implied by the difference in the in Table 6, this implies a lower limit ag > 0.54. 

Because the amplitudes of the simulation outputs appear to bracket the observations, the 
obvious next step is to interpolate between the simulation outputs to obtain a best fit value for the 

power spectrum amplitude. For this purpose, we fit the power-law A'jp{kp) = C a°j^ to the three 
values of the amplitude Ap(kp) at Zgim = 2,3,4 (where fflsim = (1 + ^sim)^^) listed in Table 5, 
which were obtained by fitting equation (4) to the simulated power spectrum in §5.2. We do this 
for all three values of the mean transmitted flux at the three redshift bins of the data. The 
results are given in Table 7. 

To find the best fit for the power spectrum amplitude, we now fit the observational data at each 
redshift bin with the two parameters A|,(fcp) and Vc, but expressing the amplitude of the flux power 
spectrum in terms of aobs/O'sim, where asim is determined from the power-law flt A'^{kp) = C a^j^ 
with the parameters in Table 7. Figure 10 shows the value of Ax^ as a function of aohs/o-simi 
when we let Vc vary, for each redshift bin Zq^s (Ax^ is equal to the x^ function minus its minimum 
value). The solid thick line is the sum of all three x^ functions (subtracting the minimum value), 
thereby yielding the value and error of the amplitude when it is required to be the same at all 
three redshift bins. The result is aohs/o-sim = 0.856 it 0.042; this is equal to the factor by which 
we need to multiply the initial mass fluctuations in the simulation to obtain the best match to the 
observations. The error bar is increased to aobs/o-sim = 0.856 it 0.052 when including the error in 
the determination of the power spectrum amplitude from the simulation (sec Table 5, we use the 
error on the z = 2>, F = 0.684 amplitude), determined as explained in §5.2. The x^ value of the 
power spectrum fit of all three redshift bins combined, x^ = 35 with 31 degrees of freedom (35 data 
points minus four free parameters), should be exceeded randomly 28% of the time. This implies 
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Fig. 9. — The observed power spectrum compared to the fitted simulation outputs for z = (4, 3, 2) 
represented by {solid, long-dashed, dashed) fines. The simulated power spectra are computed after 
F has been fixed to match the observation that they are being compared to. The dotted line is a 
fit to the observational data. 
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Table 6. Values of jv for direct comparisons between the observational data points and fits to 

power spectrum in the simulation. 



^ohs 








3.89 


1.8 


3.6 


7.6 


3.00 


1.7 


4.1 


7.2 


2.41 


2.3 


1.8 


1.6 



Note. — Each comparison uses only points with k < 0.032 (kms ) ^ and has v = 11. 



Table 7. Parameters for interpolation between the power spectrum amplitudes at different 

redshifts in the simulation. 









F 


C{F) 


a{F) 


0.475 


0.0513 


0.735 


0.684 


0.0440 


0.74 


0.818 


0.0288 


0.670 



Note. — The interpolation formula is A|,(A;p) = C(F)(a5j„/0.25)'*(^). 
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Fig. 10. — Fits comparing the interpolated amplitude in the simulation to the amplitude of the 
observations. The thin lines show Ax^ for Zq^s = (3.89,3.00,2.41) [solid, long-dashed, short- dashed). 
The thick line shows the combination of all three redshifts. 
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that our observational data are perfectly consistent with the redshift evolution of the flux power 
spectrum predicted by the simulation, and with the shape in equation (4) when Vc is allowed to 
vary. 

To conclude, our result is that the rms amplitude of the initial fluctuations of the ACDM model 
in the simulation we use should be reduced by the factor 0.856 ib 0.052, or to erg = 0.68 it 0.04. A 
possible modeling error in the determination of this amplitude is that the flux power spectrum is 
also affected by the temperature distribution of the gas, and we are relying on the temperatures 
given by the simulation for our measurement of the amplitude. For example, if a temperature- 
density relation T oc p^~^ is followed, the neutral density varies with density as n oc ^2-0.7(7-1) 
photoionization equilibrium, so the relation between the flux and mass power spectra depends on 
7-1. 

We have assumed in our analysis that the error bars on the measured power spectrum points 
are uncorrelated. Wc have verified this by re-running the amplitude fitting procedure described 
above on 100 bootstrap realizations of the observed power spectrum for each redshift bin. We 
measure the dispersion in the best fit values of aobs/o-sim and find (±0.054, ±0.076, ±0.168) for 
z = (4,3,2). The errors estimated using Ax^ = 1 in Figure 10 are (±0.07, ±0.06, ±0.11). The 
errors at lower redshift are probably becoming more correlated, as expected given the increased 
non-linearity, but the effect is still not large. We compute a weighted mean and error for aobs/csim 
using these error bars and find aobs/f^sim = 0.845 ± 0.043 while we found, dobs / ^sim — 

0.856 ±0.042 

above. The difference is clearly not important for our current level of precision, but it may become 
important when a larger sample of quasars is available. 

5.5. The Slope of the Power Spectrum 

We now show that the slope parameter n of the power spectrum we measure from both the 
simulation and the observations (by fitting equation [4]) is consistent with the linear theory, real 
space power spectrum used to set the initial conditions of the simulation. 

We start analyzing the spectra of the numerical simulation. In Figure 11, the Ax^ value is 
plotted as a function of n, where Ap and Vc are free parameters at each redshift. The parameter 
T(z) in the power spectrum formula is still fixed to the model assumed in the simulation. The 
result is consistent with n = 0.95 at all three redshift bins. The best fit is n = 0.93 ± 0.07. We 
use the error bar given by Ax^ = 1 for the z = 3 output only because the three curves are largely 
evolved versions of each other, so the values of n obtained from different simulation outputs are 
not independent. It is especially reassuring to see that the widely different mean flux decrements 
at the different redshifts do not seem to have any systematic effect on the slope. 

In Figure 12 we present the same Ax^ functions for the observations. We again allow Ap and 
Vc to vary independently at each redshift (six free parameters) but fix r{z) to the value given by 
the simulation parameters. We find an overall best fit value of n = 0.96 ± 0.08 with x^/z^ = 1.13 
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Fig. 11. — Fits to the power spectrum in the simulation with varying n. The hnear theory value of 
n in the initial conditions was n = 0.95. The thin lines are Ax^ = ~ Xmin ^'-'^ ^ ~ 2) {solid, 
long-dashed, short-dashed lines). The thick solid line is the sum of the three redshifts. At each 
redshift the mean flux decrement was fixed to the value of the nearest observational redshift bin. 
For each redshift the amplitude and a ID cutoff were fitted independently. Note that the statistical 
error on the combined result is larger than A^^ indicates because the three sets of errors are not 
independent. The best fit value is n = 0.93 ± 0.07 where the error ±0.07 is the error for the z = 3 
output alone, not the sum of the three. The best fits have = (2-7, 1.6, 0.16) for z = (4, 3, 2) and 
u = (4, 4, 2). The probability for the z = 4 value of is only 3% but this is probably reasonable 
because we expect that the error bars on the simulation are somewhat underestimated (see text). 
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Fig. 12. — Fits to the power spectrum of the observations with varying n. The thin hnes are 

= ~ Xmin ^ — (3.81,3.0,2.41) {solid, long-dashed, short-dashed lines). The thick sohd 
hne is the sum of the three rcdshifts. For each redshift the amphtude and a ID cutoff were fitted 
independently. Tlic best fit value overall is n = 0.96 ± 0.08 with x^/i" = 1-13 for = 28. The 
simulation measurement should be used as a correction to give a final result n = 0.98 ± 0.11. 
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for u = 28 degrees of freedom. 

Since our test of the slope fitting procedure using the simulation (Figure 11) yielded n = 
0.93 lb 0.07, when the true linear theory slope of the model was n = 0.95, we shall assume that 
the same offset An = —0.02 it 0.07 should be present in the observations. We use this offset as 
a correction to the observational slope and add the error in quadrature. The final result for the 
estimated linear theory slope is then n = 0.98 it 0.11. Note that this result assumes that the 
deviation from a power-law for the power spectrum of the real universe matches the deviation in 
the KCDM model of our simulation. 



5.6. Joint Fit for the Amplitude and Slope 

The result for the slope in §5.5 was obtained independently in the three redshift bins, without 
requiring the amplitudes to follow the evolutionary law of Table 7. In fact, generally it would not 
be valid to use the values of C{F) and a{F) in this Table when the initial power spectrum does not 
have the same slope as in the simulation. However, the result we found for the slope is very close to 
that of the simulation. This justifies a joint fit for the amplitude and slope using the interpolation 
constants in Table 7. 

The joint fit applied to the simulation yields aobs/o,sim = Ij by definition, and n = 0.94 it 0.04. 
This result is close to that found in §5.5, with a smaller error bar mostly because this is now a 
combined fit to all three redshifts. For the same fit to the observations, the amplitude is aohslo-sim = 
0.85 ± 0.04 and the slope is n = 0.92 it 0.07. The reduction by two in the number of free parameters 
(since the amplitudes at the three redshifts are now tied by the formula from Table 7) is accompanied 
by an increase of 3.3 in for the best fit to the observations. Including the correction and error 
from the fit to the slope in the simulation (An = —0.01 it 0.07 using the error bar from only one 
redshift output) gives a final result n = 0.93 it 0.10. 

The value of k we use to specify the amplitude, kp = 0.04 (kms~^)~^ at z = 3, was chosen 
to make the error bars on the slope and amplitude independent for this measurement. Note that 
this pivot k is five times larger than the pivot used by Croft et al. (1999) because we have used 
data on smaller scales and also weighted the fits using the statistical error bars instead of the more 
conservative weighting (that favored large scales) used by Croft et al. (1999). The fact that we 
use the ID power spectrum for fitting also contributes to the large value of our kp, because a data 
point PiD{k) is an integral over the 3D power at wave numbers larger than k. We are able to take 
advantage of our higher resolution data because we also have a higher resolution simulation to test 
the procedure. However, these errors are statistical only, and do not include systematic errors due 
to the way we model the flux power spectrum as a function of the mass power spectrum based on 
the simulation used here. 

Our result is that the observational data appears to favor a value of P{kp) at kp = 0.04 ( km s~^) 
that is 0.85 ± 0.05 times the amplitude of the simulation (after adding the error in the deter- 
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mination of the simulation amplitude). Computing the value of the linear theory power in the 
simulation at kp = 0.04 (kms^^)~^ and z = 3 we find that our result corresponds to P^Dik = 
0.04 (kms~^)-\ z = 3] = (2.2±0.3) x 10^ (kms^^)^ or equivalently Af^lk = 0.04 (kms"^)-\ z = 
3] = 0.72 lb 0.09 where A^(/c) is the contribution to the mass density variance per unit interval 
in In A;. Croft et al. (1999) measured A^lk = 0.008 (kms'^)"!, z = 2.5] = O-STtoH- Using our 
amplitude measurement at kp and our measurement of the slope relative to the simulation we find 
Al[k = 0.008 (knis"^)-\ z = 2.5] = 0.32 ib 0.07. The two results are consistent, differing by 
~ 1.3 a. Note that the change in slope in the CDM power spectrum, between the two pivot points, 
is important for this comparison. 

We have discussed the slope of the power spectrum in terms of the large scale asymptotic slope 
n because it is simple to relate this number to the simulation initial conditions and cosmology 
in general. Of course we are really measuring an effective value of the slope on the scale of the 
Lya forest. We define the slope parameter Up to be the slope at kp = 0.04 (kms"''^)"-'^ at z = 3. 
Using equation (3), it is straightforward to compute Up for a given n. For the value of T{z) used 
in the fits, and with n = 0.95, we find Up = —2.53. The observational result, n = 0.93 ib 0.10 (or 
n = 0.98 lb 0.11 using the more conservative method in §5.5), corresponds to Up = —2.55 zb 0.10 
(or Up = —2.50 lb 0.11). For the pivot point that Croft et al. (1999) used, z = 2.5 and k = 
0.008 (kms"^)-^ the slope of our fitting function is -2.22 for n = 0.95 so our result corresponds to 
-2.24 (-2.19) while they found -2.25. For measuring the slope at kp, using the CDM power spectrum 
shape is not crucially important; however, the extrapolation we use to compare to the Croft et al. 
(1999) result is only valid if the CDM power spectrum correctly describes the shape of the power 
spectrum in the real universe. 

5.7. The A|i — A.'^^^^g Relation and Possible Modeling Errors in our Result 

The ratio of the flux fluctuation amplitude (from Tabic 5) to the linearly extrapolated mass 
fluctuation amplitude, B = [A'j^/ A'^^^g^]^/'^ , is given in Table 8. This ratio can be thought of as the 
"bias" of the Lya forest. However, this factor is not the bias to be applied in linear theory peculiar 



Table 8. Ratio of flux fluctuations to linearly extrapolated mass fluctuations in simulation, 

B = [Al{kp)/Al,,,{kp)]y\ 





B{F = 0.475) 


B{F = 0.684) 


B{F = 0.818) 


4 


0.264 


0.243 


0.198 


3 


0.228 


0.212 


0.172 


2 


0.191 


0.177 


0.142 
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velocity calculations, because in the Lya forest a non-linear mapping is applied to the redshift space 
optical depth field to obtain the flux. 

The results in Table 8 should allow for an easy comparison with other numerical simulations 
of the quantity wc need to obtain the mass power spectrum amplitude from observations. One of 
the main uncertainties in the B factor arises from the temperature-density relation, which affects 
the relation between neutral density and baryon density: nni oc a{T) oc ^'^ ^\ where 

A5 is the baryon density divided by its mean value, and we have assumed T oc A^""*^. Increasing 
7 decreases the optical depth fluctuations for fixed baryon density fluctuations. The theoretical 
expectation is 0.0 < 7 — 1 < 0.6 (Hui & Gnedin 1997), and our simulation falls roughly in the 
center of this range, giving a potential error of ~ 10% in 2 — 0.7 (7 — 1). A change in thermal 
broadening also changes the amplitude of flux fluctuations in a more complicated way by changing 
the weighting of different densities in the spectra. 



6. THE CORRELATION FUNCTION 

We now discuss the correlation function S,{Av) = {dF{v)6F{v + Au)). It is straightforward 
to measure ^ by directly averaging over the pixels in the spectra. The results are shown in Table 9. 
It is important to recognize that the error bars we give here are strongly correlated and we include 
them only to give a qualitative idea how large they are. Any statistical comparisons with data should 
use the full error covariance matrix, which is available in the website given earlier. In Figures 
13(a,b,c) we show the correlation function measured from the observations and the simulation. The 
observation and simulation do not agree well in spite of the good agreement between the power 
spectra shown in Figure 5. The reason, as we will show below, is that the correlation function 
computed from the simulation is strongly affected by the finite box size. Figure 14 shows ^ plotted 
for each of the observed redshift bins. For this figure each correlation function has been divided by 
the variance, ap, at the same redshift, so the curves are all normalized to ^(0) = 1. There is no 
clear difference between the correlation lengths at the three redshifts. 

Finally we show the effect of the box size on the correlation function from the simulation in 
Figure 15. We first fit to the power spectrum of the simulation at z = 3 using the linear theory 
function with a 3D cutoff instead of our usual ID cutoff. We use the 3D cutoff because this function 
fits the full range of k, including the small scales. When we Fourier transform the analytic power 
spectrum using the parameters that fit the simulation, the correlation function obtained is quite 
different from the one measured directly from the simulation. In fact, the agreement in Figure 
13 between the observed and simulated ^(Av) as Av ^ is only a coincidence: the observed 
correlation function is in fact smaller than in the simulation model, but the correlation function in 
the simulation is lowered by the finite box size effects to a value close to the observed one. Figure 
13 also shows that we can approximately recover the correlation function that is measured from 
the simulation by summing over the analytic power spectrum using only k values of discrete modes 
that are actually included in the simulation box. 
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Table 9. The observed flux correlation function. 







C(Ai;,z = 3.89) 


C(Ai;,z = 3.0) 


C(Ai;,z = 2.41) 


(kms-^) 


(kms-^) 








7 


10.5 


0.1230 ± 0.0031 


0.1132 ±0.0055 


0.0767 ± 0.0069 


14 


17.5 


0.1142 ± 0.0032 


0.1071 ±0.0055 


0.0721 ± 0.0067 


21 


24.5 


0.1039 ± 0.0035 


0.0993 ± 0.0055 


0.0667 ± 0.0065 


28 


31.5 


0.0936 ± 0.0038 


0.0911 ± 0.0055 


0.0609 ± 0.0062 


35 


39.53 


0.0827 ± 0.0040 


0.0820 ± 0.0056 


0.0544 ± 0.0059 


44.06 


49.76 


0.0710 ±0.0043 


0.0717 ±0.0057 


0.0469 ± 0.0056 


55.47 


62.65 


0.0595 ± 0.0046 


0.0613 ± 0.0059 


0.0390 ± 0.0051 


69.83 


78.87 


0.0484 ± 0.0048 


0.0514 ± 0.0061 


0.0313 ± 0.0047 


87.92 


99.29 


0.0380 ± 0.0049 


0.0420 ± 0.0062 


0.0240 ± 0.0041 


110.7 


125 


0.0294 ± 0.0048 


0.0336 ± 0.0062 


0.0178 ± 0.0035 


139.3 


157.4 


0.0211 ± 0.0047 


0.0275 ± 0.0056 


0.0120 ± 0.0031 


175.4 


198.1 


0.0143 ±0.0047 


0.0227 ± 0.0048 


0.0069 ± 0.0026 


220.8 


249.4 


0.0095 ± 0.0049 


0.0177 ±0.0045 


0.0037 ± 0.0025 


278 


314 


0.0064 ± 0.0046 


0.0104 ± 0.0041 


0.0004 ± 0.0023 


350 


395.3 


0.0047 ± 0.0048 


0.0078 ± 0.0037 


-0.0023 ± 0.0020 


440.6 


497.6 


-0.0010 ±0.0055 


0.0072 ± 0.0035 


-0.0003 ± 0.0024 


554.7 


626.4 


0.0027 ±0.0058 


0.0029 ± 0.0035 


0.0014 ± 0.0028 


698.3 


788.5 


0.0060 ± 0.0054 


0.0033 ± 0.0035 


0.0025 ± 0.0017 


879.2 


992.7 


0.0059 ± 0.0038 


0.0060 ± 0.0032 


0.0005 ± 0.0017 


1107 


1250 


-0.0007 ± 0.0040 


0.0051 ± 0.0045 


0.0008 ± 0.0017 


1393 


1573 


-0.0014 ± 0.0043 


-0.0060 ± 0.0041 


-0.0005 ± 0.0019 


1754 


1980 


-0.0042 ± 0.0033 


-0.0005 ± 0.0039 


-0.0019 ±0.0016 


2208 


2493 


0.0013 ± 0.0033 


-0.0028 ± 0.0032 


0.0002 ± 0.0018 



Note. — Averaged over bins defined by the velocity separation Avmin^ with mean velocity 
separation Avmean (the maximum velocity separation for the final bin is Avmax = 2780 kms~^). 
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Fig. 13. — The observed correlation function of i*" as a function of velocity separation along the 
line of sight (points with error bars). The definition of the correlation function is: ^(A?;) = 
{dF(v)6F{v + Ai;)) where 5F = F — F. The error bars on Av show the bins used to average 
the correlation function. The error bars on ^ are too highly correlated to use for statistical analysis 
without accounting for off-diagonal terms in the error covariance matrix. The dotted line shows 
the correlation function when the possible metal line regions are included in the computation. The 
solid line shows the correlation function from the simulation, which is strongly influenced by the 
finite box size, (a) shows z = 3.89, (b) shows z = 3.00, and (c) shows z = 2.41. 
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(b) z„,,=3.00, z^^=3 
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Fig. 14. — Comparison of the correlation functions for z = 3.89 {solid line), z = 3.0 {long-dashed 
line), and z = 2.41 {short-dashed line). The amphtude of each curve has been divided by o'p{z) 
so that ^(0) = 1.0 for each. The thin dotted hnes connect the error bars on the z = 3 correlation 
function. 
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Fig. 15. — The effect of the finite box size of the simulation. The points with error bars show the 
observed ^ for reference. The sohd hne shows ^ measured from the simulation. The dashed line 
shows ^ computed by integrating an analytic power spectrum with parameters determined by a fit 
to the power spectrum measured from the simulation. The fit uses a Gaussian cutoff on the linear 
theory 3D power spectrum and fits the entire range of k fairly well. The dotted line shows the 
correlation function calculated from the same analytic power spectrum, but here it is computed 
by summing over the ID modes actually in the box instead of integrating over all k. There is a 
discrepancy between the solid and dotted lines because the mean along each line of sight through 
the simulation can vary from the overall mean and the dotted curve is computed from a fit to the 
simulation power spectrum rather than the power spectrum itself. 
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7. DISCUSSION 

We have presented an observational determination of the PDF, the power spectrum, and the 
correlation function of the transmitted flux in the Lya forest, using a sample of eight quasars over 
the redshift range 2.1 < z < 4.4. All our results are given with error bars that include the variance 
of our sample of eight quasars. These results can be compared to the predictions of any cosmological 
simulations of a given theory, after instrumental resolution and noise with the same amplitude as 
in our data are added to the sinmlatcd spectra. Any future observational determinations of the 
same quantities from other data sets will also be easily comparable. 

In our opinion, there are two main objectives in the accurate measurement of the statistics 
of the one-dimensional random field of the transmitted flux in the Lya forest. The first is to 
test the idea that the Lya forest is entirely explained by an IGM that is photoionized and heated 
by the known sources of ionizing radiation (AGNs and galaxies), which evolves gravitationally as 
described by a large-scale structure theory based on the presence of cold dark matter and adiabatic, 
Gaussian primordial fluctuations. The second objective is to measure some of the parameters of 
the large-scale structure model from the Lya forest observations. 

Even though there are a large number of parameters required to fully describe the cosmological 
and large-scale structure model, and the population of sources of ionizing radiation, only certain 
combinations of these parameters are important to determine the observable properties of the Lya 
forest to a first approximation. A simple intuitive understanding of what these parameters are 
can be developed by considering a simplified model where the power spectrum of fluctuations is 
approximated as a power-law, and the IGM follows a simple density-temperature relation T = 
Tq{p/ p)'^~^ . In this case, the temperature Tq determines a Jeans scale, and all the statistical 
properties of the Lya optical depth should depend only on the amplitude of the density fluctuations 
at the Jeans scale, uj, the effective power-law index near the Jeans scale nj, and 7 (it is understood 
here that all these parameters depend on redshift for any given model). In addition, varying the 

temperature Tq (while keeping crj, nj and 7 fixed) should cause a rescaling of all the statistical 

1/2 

properties of the Lya forest in velocity (proportional to the Jeans scale, or Tq ), and varying the 
parameter p (defined in eq. 1) should cause a rescaling of the optical depth. Therefore, in this 
simplified model the Lya forest depends on five parameters only, and the changes under two of 
them are trivially obtained by rescalings. Only aj and nj contain information on the large-scale 
structure model; Tq and 7 depend on how the IGM was reionized (e.g., Miralda-Escude & Rees 
1994; Hui &; Gnedin 1997) as well as the power spectrum (through the effects of shock-heating), 
and /X depends on O;,, H{z), and the intensity of the ionizing background. 

More exact predictions for the Lya forest obtained in hydrodynamic simulations will depend 
on more parameters, due to effects that are neglected in the simplified model just discussed. The 
true distribution of density and temperature of the gas has of course a scatter around a mean 
relationship; this mean relationship also deviates from a power-law. This distribution depends 
on the ratio of the cooling rate to the Hubble rate, which is proportional to the new parameter 
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Ql,h'^ / H (z) . In addition, the effects of cooling imply that changing Tq will no longer result in a 

simple rescaling in velocity of all statistical properties of the Lya forest, since the cooling rate has 
a complex dependence on temperature that breaks the self-similar scaling. These effects should 
be important only at high densities, where cooling is important; at low densities, the thermal 
evolution of the IGM naturally produces the "Hui-Gnedin relation" (Hui &; Gnedin 1997). The 
value of ^{z) can also change the properties of the density field under gravitational evolution, but 
this effect should be small because Q{z) is very close to unity at high rcdshift. The evolution of Tq 
with redshift, and the baryon fraction {Qi,/^}) can also affect the Jeans scale and the gravitational 
evolution of the photoionized gas (Gnedin & Hui 1998). These other parameters are likely to cause 
only small changes in observable properties compared to the five main parameters given above. 

We have presented here measurements of some of the principal parameters determining the 
statistical properties of the Lya forest. The mean transmitted flux determines the parameter /x, and 
the flux power spectrum can yield the amplitude and the effective slope of the initial mass power 
spectrum. We have carefully computed the error bars due to sample variance for all our results, 
which are generally consistent with previous ones, although the amplitude of the power spectrum 
is found to be slightly lower than in (Croft et al. 1999). Both the flux PDF and the power 
spectrum predicted in the numerical simulations in Miralda-Escude et al. (1996) agree extremely 
well with what is observed. The main limitation we have for the accuracy of this comparison are 
the uncertainties introduced by the continuum fltting of the flux, and by the presence of narrow 
metal lines. 

Our results for the measurement of the initial mass density perturbations are given in terms 
of the amplitude and power law index (i.e., the slope on a plot of lnP(A;) vs. Ink) of the linearly 
extrapolated primordial power spectrum at kp = 0.04 (kms~^)~^ at z = 3. The amplitude is 
A2[/c = 0.04 (kms"^)-i, z = 3] = 0.72 ±0.09 where A^ik) is the contribution to the mass density 
variance per unit interval in Ink. The slope is Up = —2.55 it 0.10. This amplitude result, with its 
small error bar, must be thought of as partially preliminary. Further study is needed to carefully 
quantify the effects on the A|. — A^^^^ relation of changing the temperature-density relation or 
details of the simulations. Croft et al. (1998, 1999) found no important systematic errors but this 
was in the context of larger statistical error bars, and their set of simulations was not exhaustive. 

If it proves to be robust, the power spectrum result can strongly constrain potential cosmo- 
logical and galaxy formation models. For example, McDonald &; Miralda-Escude (1999b) found 

that, in order to produce the high velocities observed in studies of the kinematics of damped Lya 
systems (Prochaska &: Wolfe 1997, 1998), a model must obey o"£)i^4 > 0.78 (95% confidence), 
where (Tdlaa is the linear theory rms fluctuations on spheres of radius 100 kms^^ at z = 4. The 
Lya forest power spectrum amplitude we have derived in this paper implies aDLA4 = 0.67 ± 0.04, 
inconsistent with the above upper limit derived from damped Lya systems at the 2.5cr level. This 
discrepancy, if conflrmed in the future, might simply be due to a different relation between the ob- 
served velocity dispersions in damped Lya systems and the velocity dispersions of their host dark 
matter halos from the one that was calculated in the spherical equilibrium models in McDonald 
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& Miralda-Escude (1999b), but it should in any case provide a new constraint to understand the 
dynamics of damped Lya systems. 

We will also present additional results on the measurement of the parameters Tq and 7 in 
work that is now in preparation (P. McDonald et al. , in preparation). These parameters can be 
determined from the shape of the power spectrum at small scales, or by fitting absorption features to 
Voigt profiles and comparing the distribution of Doppler widths to what is obtained in simulations 
(Schaye et al. 1999; Ricotti et al. 1999; Bryan & Machacek 1999). 

Apart from measuring these parameters, one should also strive to find tests for the validity 
of the general framework of the gravitational evolution theory in CDM models of the Lya forest. 
Various perturbations from this ideal theory should be present at some level. Reionization smoothes 
the gas fluctuations at scales smaller than the Jeans scale for photoionized gas, but dense structures 
near the Jeans scale will not be ionized, heated and destroyed for a long time (Bond et al. 1988). For 
example, a virializcd halo with velocity dispersion ~ lOkms"^ should not accrete much gas when 
forming after reionization (since the gas has too much initial entropy and is not able to cool), but 
having formed before reionization it may keep its gas in thermal and hydrodynamic equilibrium, 
as in the minihalo model (Rees 1986; Abel & Mo 1998). These objects might introduce subtle 
changes in the Lycc forest that would depend on when and how the IGM was reionized. At the 
same time, when He II is reionized, the rapid heating of the IGM can be inhomogeneous on large 
scales, resulting in new fluctuations in the Lya forest due to the gas temperature that depend on 
the luminosity function and spatial distribution of the ionizing sources (Miralda-Escude &; Rees 
1994). Outflows from AGNs, radio sources and starbursts may also perturb the IGM at some 
level. Finally, the large-scale structure theory itself may be modifled (for example, by having 
non-Gaussian primordial fluctuations). A continued effort in comparing the observed spectra with 
detailed numerical simulations, Tising a diversity of statistical probes and methods, is required to 
search for the influence of these phenomena. 

We thank David Weinberg for his useful comments on an early draft of this paper. 

A. GENERATION OF MOCK SPECTRA 

To check various aspects of our data analysis procedure it is useful to have a method for 
creating mock spectra with properties similar to the observed ones. For example, we test our code 
for computing power spectra from the observations by using it to compute the known Pp^k) of the 
mock spectra. We also test our method for generating error bars by comparing the bootstrap errors 
estimated from single random spectra to the ideal results obtained by generating large numbers of 
independent random spectra (see below). 

The procedure for generating random spectra consists of the following steps: 
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1. Fit an analytic function to the observed Ppik). 

2. Generate a Gaussian random realization of the fitted Pf(^) on a large, fine grid. 

3. Map the values from the grid points onto the observed pixel coordinates Awj. 

4. Add noise to each pixel with amplitude given by the mean of the observations. 

In step 1 we fit the observed Ppik) using Equation (3) multiplied by a 3D Gaussian smooth- 
ing function instead of the ID smoothing function wc generally use. We use this function because 
it provides a good qualitative fit to the observed Pf(^) over close to the full range of k. We are not 
concerned with obtaining the best possible fit because we only want to generate mock spectra with 
a precisely known input power spectrum that will have properties similar to the observed one. In 
step 2 we generate a Gaussian random field with Fourier mode amplitudes given by (a^) = Ppik) 
where Ppik) is given by the fitting formula from step 1. The grid used is evenly spaced in velocity 
and is always longer than the observed spectrum by a factor of at least 8 with resolution better 
by a factor of at least 4. In step 3 we superimpose the uneven set of observed pixel locations on 
top of the grid generated in step 2 (using a random origin). The value of Fi assigned to each Avi 
is linearly interpolated from the two adjacent values on the fine grid. The spectra generated by 
this procedure do not have identical statistics to the observations because the observations are not 
Gaussian but this should not be important for our purposes. 

Although the random fields that we generate have the same power spectrum as the observed 
spectra, they have a Gaussian PDF which is not limited to the range < F < 1. It should be 
possible to create more realistic spectra by generating a Gaussian (or even non-Gaussian) field for 
the optical depth or the baryon density instead of the transmitted flux (e.g., Bi 1993). Wc have 
adopted the very simple method of generating a Gaussian field and calling it the transmitted fiux 
because this allows us to use directly the power spectrum measured from the observed transmitted 
flux. 

B. BOOTSTRAP ERROR BARS 

It is well known that gravitational collapse leads to non-Gaussianity of the density field in the 
non-linear regime probed by the smallest scales in the Lya forest. It is generally difl&cult to estimate 
error bars on measurements when we cannot assume that the underlying process is Gaussian. To 
solve this problem we compute error bars for all of the numbers measured in this paper using a 
variation of the bootstrap method (Press et al. 1992). 

If wc have N independent data segments, a bootstrap realization of a computed statistic is 
created by randomly drawing N segments from the set with replacement and recomputing the 
statistic from the randomly modified set of data. The bootstrap error bar on a statistic is simply 
the dispersion among the bootstrap realizations of the statistic. Ideally we would like to divide the 
data sample into numerous independent parts with identical statistical properties as we could if we 
had a large number of quasars all at the same redshift for example. Unfortunately we only have 
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8 quasars and no more than 5 in a single redshift bin. One would think that because the spectra 
arc long relative to the correlation length they can be split into some number of approximately 
independent chunks and the bootstrap method applied to those chunks. The question then is how 
large must these chunks be relative to the correlation length? We can analytically investigate the 
applicability of the bootstrap method for the situation at hand in the special case where we assume 
the field is Gaussian. The bootstrap method does not depend on the field being Gaussian so we 
can hope that if it works in the Gaussian case it will work similarly well on observational data with 
the same power spectrum. 

We now investigate analytically the requirements for obtaining reliable bootstrap errors in the 
case where we are segmenting correlated data. We represent an abstract set of data by Si where 
i = 1..N runs over all N pixels. To make analytic calculations possible we assume (5 is a Gaussian 
random field with {6) = 0, ((5^) = cr^ = ^(0), and {SiSj) = — j). We want to imagine the 
situation where the data has been split into N/c segments each containing c pixels. If / is some 
statistic that can be measured from the data (like the mean), we define / to be the value of / 
that is actually computed from a given set of data, and / to be the value of / computed from a 
bootstrap realization of the same data used to compute /. Taking the variance as an example we 
have 

1=1 

and 



-1 



sc+c 



s=0 i=sc+l 

where Og is the number of times that data segment s was drawn in forming the bootstrap realization. 
The occupation numbers Oj have the following property 

((„,-l)fe-l)>, = i,(-^)--^ (B3) 

where represents an average over different bootstrap realizations of the same data, Ng = N/c is 
the number of segments drawn from, and Sij = 1 for i = j and 6ij = for i ^ j. We will generally 
use the large Ng approximation 

{{oi - 1) {oj - 1)), = 5ij - ^ . (B4) 

To demonstrate the application of this formalism we will compute the bootstrap error on 
cj^. To make the algebra more transparent we will further simplify the calculation by making the 
assumption that the pixels are uncorrelated and the segment length is only one pixel: 
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Evaluating the expectation value over bootstrap realizations gives 

2\ ^ N N . V 

/ b j=i ^/^i \ / 

In this simplified case it is easy to find the true error on from the assumption that the field is 
Gaussian. The result is 

To relate this to the bootstrap error we must take the expectation value of Equation (B6) over 
diflFerent realizations of the underlying Gaussian distribution to obtain 



/ &/ j i j 

where we have completed all of the sums using (SiSj) = ^{i — j) = Sij and dropped terms of higher 
order in iV~^. We see that the correct result is obtained in the large N limit. 

The error on the correlation function is more complicated and we will not assume uncorrelated 
pixels or one-pixel segments. The exact result for a Gaussian field in the limit that the data set is 
much longer than the correlation length is 

1 ^ 

(^^n — ^nj (^Cm — ^rnj ) ~ ^ {^k^k+n-m + ^k-m^k+n) (B9) 

k=-N 

where we have also assumed that n and m are much smaller than the length of the data set. A 
bootstrap realization of ^ is given by 

1 c SC+C 

s=0 i=sc+l 

After evaluating the bootstrap average we find 

-1^-1 



.Y -, JV 

SC+C s'c+c 



(^n-Cn) (?m-^m))^ = -^ XI XI Yl Yl Si6i+nSjSj+m (Sgs' - ■ (Bll) 

s=0 s'=0 i=sc+l i'=s'c+l 

If we make the assumption that the data set is long enough that it can be split into many segments 
with each segment much longer than the correlation length the calculation of the bootstrap errors 
averaged over realizations of the Gaussian field gives the correct result: 

c 

- (Cm - Cm) ^ jy ^ ('CfcCfc+n-m + Ck-m^k+n) ■ (^12) 

k=—c 

The assumptions we were forced to make so that the bootstrap error came out correct seem to 
be the ones that would be expected intuitively: the data set must be split into a large number of 
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segments and those segments must be larger than the correlation length of the data. Note that 
cross correlations for widely separated n and m will not be accurately reproduced but only if they 
are so widely separated that the cross correlation is in fact very small. We have really only shown 
that the bootstrap method can be made to work in the infinite data limit. 

To investigate the effectiveness of the bootstrap method applied to real data we turn to the 
mock spectra described in Appendix A. We generate a set of 50 of each quasar spectrum and apply 
the data analysis procedure to them exactly as we have done with the observational data. We find 
that the optimal number of segments for our redshift bins seems to be about 300 giving segments 
approximately 100 pixels long. The errors on a^, the PDF, and ^{Av) are underestimated by 5-15% 
for the three redshift bins using this size segments. The error estimated for the mean transmitted 
flux seems to be more strongly affected by the long range correlations (underestimated by a factor 
of 1.6 in the worst case), probably because it is only first order in ^ instead of second order like we 
see in Equation (B9). 

To correct for the these underestimations we find the ratio of the true error to the bootstrap 
error in the mock spectra for every statistic we measure and then multiply the bootstrap error on 
the observational measurement by this factor. In the case of the PDF and correlation function 
we have a full error matrix including the correlations between errors on different bins. We judge 
the underestimation of the errors by looking at tests comparing the PDF or ^ for all possible 
of pairs of mock data sets. The mean value of x^/'^ ^or u degrees of freedom should be 1 and 
the full distribution of can be computed easily. We find that the mean computed using the 
bootstrap errors is no more than 20% higher than expected and we rescale the error matrix of the 
observational data by this overall factor instead of trying to correct the individual matrix elements. 
With this rescaling the distribution of becomes essentially perfect. Note that we do not in any 
way compute the observational errors from the Gaussian mock spectra since we use them only 
to find edge effect corrections that are then applied to the bootstrap errors calculated from the 
observational data. 

Although it plays a large part in the paper, in this discussion we have left out any mention 
of the power spectrum. We have not found a fully satisfactory method for computing a full error 
matrix for Ppik) using the bootstrap method. We do not get correct results using the type of 
analytic calculations we did for the correlation function. This is probably because we need to treat 
the edge effects from the finite data set more carefully in the case of the power spectrum. The 
obvious technical problem with simply applying the method to the data anyway is that we can 
not make the segments very small and still measure the power spectrum from them. We need to 
keep the segments a few times larger than the longest wavelength for which we want to measure 
P{k) but this restricts the number of them for our data set to about 10. We have tried computing 
full error matrices for our mock spectra using only 10 segments but the results are discouraging. 
The distribution of x^ computed from these matrices is not close to what is expected. This is 
because the statistical error in estimating all of the error matrix elements from only ten segments 
is too high. Note that in our mock spectra there are no correlations between the -Pf(^) bins so any 
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that we find in the bootstrap realizations are simply noise. Visually comparing the error matrices 
from the mock spectra and the real data we find that they look very similar. There is no obvious 
evidence that the errors in the real data arc more correlated than the theoretically uncorrelated 
mock data (both become strongly correlated at very high k where the window function becomes 
important). We will simply use the diagonal elements of the bootstrap error matrix and treat the 
bin errors as independent. We correct the errors on each bin separately by comparing the true 
and bootstrap errors in the mock spectra. These corrections are only significant (~ 30%) on the 
longest wavelength modes. Note that using independent bootstrap errors on each bin still takes 
into account the correlation between the individual k modes that are averaged to give the bin 
value. The measurement of the amplitude shows that at the redshifts in question the modes with 
k < 0.04 ( kms~^)~^ should not be fully non-linear supporting the supposition that the correlations 
between bins is not large. 



C. THE Lya FOREST FLUX POWER SPECTRUM IN THE LARGE SCALE 

LIMIT 

This Appendix shows that the flux power spectrum in the Lya forest must indeed be propor- 
tional to the mass power spectrum in the limit of large scales, as was first assumed by citetcwk98. 
We shall first consider the simple case where the effect of peculiar velocities is neglected. 

Let PoiF) be the fiux probability distribution, and P(F\6) be the conditional distribution of 
F given that the overdensity smoothed over a certain scale Rg around the point where the fiux 
F is observed is 5. We consider the correlation of the transmitted fiux at two points, Fi and F2, 
separated by a distance much larger than Rg, but where Rg is still a sufficiently large scale for 
linear theory to be valid. In this case, the correlation of Fi and F2 should be exclusively due to 
the correlation of Si and 62, the smoothed densities around the points 1 and 2. In other words, the 
joint probability distribution ^2(^15 ^2) should be equal to the product P{Fi\Si)P{F2\52) (notice 
that the correlation of the whole deformation tensor at the two points should enter here at the same 
order as the density correlation; we are assuming that the constrained probability distribution of 
the flux depends mostly on the overdensity, and not the other components of the deformation 
tensor). Since ^ < 1, we can write P{F\S) = Po{F) + SPs{F), where Ps{F) = [dP{F\S)/dS]g^Q. 
The flux correlation function is then given by: 



< F1F2 > 



= J dFi dF2 < [Po{Fi) + 5iPs{Fi)] [Po{F2) + 52Ps{F2)] Fi F2 > 
= < 6162 > J dFi dF2 PsiFi)Ps{F2) Fi F2 
= C{xi2) ■ j dFPs{F)F 



where ^{xi2) =< S1S2 > is the density correlation function at the separation X12 between points 1 
and 2, and F(S) = J dF F P{F,5). This gives an expression for the B factor defined in §5.7. 

When pecuhar velocities are included, there should be an additional dependence of P{F) on 
the peculiar velocity gradient along the line of sight, 77 = H~^dvp/dl (where d/dl is the derivative 
along the line of sight), averaged over the same scale Rg. We can linearize as before P{F,d,fj) = 
Po{F) + dPs{F) + fjPrj{F), and define the factor 

bp = {^j dFPsiF)F^ I [J dFP^iF)F^ . (C2) 

The quantity bp a different "bias factor" for the Lja forest, this one intended for peculiar velocity 
calculations. Defining the new variable Sz = 5 + bp^fj, we obtain 

fdpy 



where is the redshift space correlation function along the line of sight of Sz, which has bias factor 
bp- Thus, we see that both bp and B = dF/dS are functions of the flux distribution Po{F), and of 
how this distribution varies with the overdensity and the peculiar velocity gradient smoothed over a 
large scale around the point where the flux is being measured. This implies a complex dependence 
oi bp and B on all the variables determining the Lya forest properties, such as the fluctuation 
amplitude on the Jeans scale, the mean transmitted flux, and the temperature-density relation. 
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